Estimation of Distribution Algorithms (EDA), sometimes called Probabilistic Model-Building Genetic Algorithms (PMBGA), are an outgrowth of genetic algorithms. In a genetic algorithm, a population of candidate solutions to a problem is maintained as part of the search for an optimum solution. This population is typically represented explicitly as an array of objects. Depending on the specifics of the GA, the objects might be bit strings, vectors of real numbers, LISP style S expressions or some custom representation. In an EDA, this explicit representation of the population is replaced with a probability distribution over the choices available at each position in the vector that represents a population member.
For example, if the population is represented by bit strings of length 4, the EDA for the populations would be a single vector of four probabilities (p1, p2, p3, p4) where each p is the probability of that position being a 1. Using this probability vector it is possible to create an arbitrary number of candidate solutions.
In evolutionary computation new candidate solutions are often generated by combining and modifying existing solutions in a stochastic way. The underlying probability distribution of new solutions over the space of possible solutions is usually not explicitly specified. In EDAs a population may be approximated with a probability distribution and new candidate solutions can be obtained by sampling this distribution. This may have several advantages, including avoiding premature convergence and being a more compact representation.
Better-known EDAs include
The model may be found to fit an existing population or take on the role of the population entirely. Once the model is obtained, it can be sampled to produce more candidate solutions which are then used to adapt or regenerate the model. EDAs are typically classified according to the level of variable interaction that their probabilistic model includes - they can be classed as univariate (no interactions), bivariate (interactions between pairs of variables) or multivariate (interactions between more than two variables) (Pelikan, 1999).